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Cooperativity is a hallmark of proteins, many of which show a modular architecture 
comprising discrete structural domains. Detecting and describing dynamic couplings 
between structural regions is difficult in view of the many-body nature of protein- 
protein interactions. By utilizing the GPU-based computational acceleration, we 
carried out simulations of the protein forced unfolding for the dimer WW—WW of 
the all-/3-sheet WW domains used as a model multidomain protein. We found that 
while the physically non-interacting identical protein domains fWW) show nearly 
symmetric mechanical properties at low tension, reflected, e.g., in the similarity of 
their distributions of unfolding times, these properties become distinctly different 
when tension is increased. Moreover, the uncorrelated unfolding transitions at a 
low pulling force become increasingly more correlated (dependent) at higher forces. 

Hence, the applied force not only breaks “the mechanical symmetry” but also cou¬ 
ples the physically non-interacting protein domains forming a multi-domain protein. 

We call this effect “the topological coupling”. We developed a new theory, inspired 
by Order statistics, to characterize protein-protein interactions in multi-domain pro¬ 
teins. The method utilizes the squared-Gaussian model, but it can also be used in 
conjunction with other parametric models for the distribution of unfolding times. 

The formalism can be taken to the single-molecule experimental lab to probe me¬ 
chanical cooperativity and domain communication in multi-domain proteins. 
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INTRODUCTION 


Mechanical forces play an important role in life processes, and many biological molecules are 
subjected to forces. There are numerous examples from biology, where mechanical forces play an 
essential role in a physiological context. Tensile forces acting on cells or bacteria originate from 
the dragging forces imposed by the fluid flow [l| . A function of a biological motor might result in 
the generation of large pulling force. For example, active force from myosin generates substantial 
mechanical stress on structures of the sarcomere [ 2 , 3]. Mechanical forces can also be generated 
within the biological system. For example, leukocytes that patrol the blood flow in search 
of path^ens, need to generate internal force to squeeze into in and pass through connective 
tissues In addition to biochemical stimuli, cellular processes involving the cytoskeleton can 
be modulated in response to external force j^. 

Structural arrangements of proteins have evolved in response to the selection pressure from 
biological forces. The three-dimensional structures often exhibit a modular architecture com¬ 
posed of discrete structural regions. These include semi-static structures such as actin filaments 

riQ 

and microtubules, and flexible structures such as muscle protein titin and fibrin clot [SHTJ. 
Titin, a giant l/rm protein formed by linked immunoglobulin and fibronectin domains defines 
the structure and elasticity of muscle sarcomere { 2 , 6|. Proteins that form or interact with the 
extracellular matrix (ECM) have modular or multidomain architecture j^, [^. These include 
fibronectin fibrils, which form meshworks around the cell, and integrins, which link the cell ex- 


10| . Fibrin polymerization in blood results in formation 


ternal and intracellular environment 
of branched network called a fibrin clot, which must sustain large shear stress due to blood flow 

nn 

[7|, lll|. Protein shells of plant and animal viruses (capsids) are often made of multiple copies of 
a single structural unit (capsomer). For example, the capsid of Cowpea Chlorotic Mottle Virus 
is an icosahedral shell comprised of 180 copies of a single (190 amino acid) protein 12|. 

Physical properties of proteins have adapted to biological forces. In ECM assembly, fi¬ 
bronectin subunits change conformations from compact to extended. The extent of assembly 


is controlled by cell contractility through actin filaments 


13| . Hence, tissue tension regulate 


matrix assembly. Virus capsids should be stable enough to protect encapsulated material (DNA 
or RNA), yet, unstable to release the material when invading their host cells 1^. Hence, me¬ 
chanical properties of virus capsids are important factors in viruses’ survival in the extracellular 
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environment and cell infectivity. Cell adhesion and migration rely on reversible changes in 
mechanical properties of cells. Spatial distribntion of internal tension in cell remodeling, de¬ 
rived from ^namic regnlation of contractile actin-myosin networks, correlates cell shape and 
movement |3|, [isf . In addition, many proteins have evolved to act as “force sensors” to convert 
tension-indnced conformational changes into biological signal lOj]. For example, tissne deforma¬ 


tion dnring mechanical stimnlation alters the conformation of ECM molecnles. Fi 


in mechanotransdnction alter their strnctnre in response to external tension 1^. Proper com- 


amins involved 


mnnication between strnctnral regions in mnltidomain proteins is important for regnlation. For 
example, dynamic conpling between the regnlatory domain and the ligand-binding domain in 
P-selectins is implicated in formation of force-activated bonds (“catch bonds” 181) with 
their ligands (PSGL-1). 

□ □□ 

Single-molecnle experimental techniqnes snch as Atomic Force Microscopy (AFM) [1|, [191, [2^ 
and optical trap [19|, l2]| , have made it possible to stndy life processes at the level of individnal 
molecnles. These experiments, which ntilize mechanical force to nnfold proteins or to dissociate 
protein-protein complexes, have made it possible to probe the nnfolding or nnbinding transi¬ 


tions one at a time 


22| . In the context of protein forced nnfolding, grabbing the molecnle at 


specihc positions allows one to select specihc region(s) of the molecnle and to dehne the nnfold¬ 
ing reaction coordinate. Mechanical manipnlation continnes to provide a nniqne approach to 
qnantify the nnfolding transitions in terms of the measnrable qnantities - the nnfolding forces 
(constant velocity or force-ramp experiment) and the nnfolding times (constant force or force- 
clamp experiment). Consider a force-clamp experiment on a mnltimeric protein Dn formed by 
head-to-tail connected identical domains (T*’s). The protein is snbjected to the external pnlling 
force, and each nnfolding transition canses an increase in the chain length, AY. As a resnlt, the 
total length of the polypeptide chain Y shows the characteristic pattern of stepwise increases 
(Ay’s), which mark seqnential nnfolding transitions in protein domains. 

The goal of statistical analysis and modeling of force spectroscopy data is to obtain accnrate 
information abont the physical characteristics of protein domains. Yet, dne to inherent limita¬ 
tions in the cnrrent experimental resolntion, there is no direct way of attribnting the times of 
transitions to the protein domains or strnctnral regions where these transitions have occnrred. 
We only have partial or incomplete observation of the system. Said differently, it is not possible 
to determine which specihc domain has nnfolded at any time since any domain can nnfold at any 
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given time. Consider the simplest case of the dimer WW—WW of the WW domain, presented 
in Fig. 1. There are two possible scenarios of nnfolding. In the hrst pathway, the hrst domain 
(WWi) nnfolds hrst and the second domain {WW 2 ) nnfolds second. In the second pathway, the 
order of nnfolding is reversed. In an experimental measnrement, what is being recorded is the 
hrst nnfolding time (ti: 2 ) and the second nnfolding time (^ 2 : 2 ) in a seqnence of two observations, 
and it is impossible to tell which domain has nnfolded hrst or second. Hence, the hrst nnfolding 
times (ti: 2 ) and the second nnfolding times (^ 2 : 2 ) involve contribntions from the hrst domain 
VFhFi, which nnfolds at time fi, and the second domain hFH^, which nnfolds at time ^ 2 - Can 
we obtain domain-specihc information from the experimental data? 

The hrst step is to nnderstand the natnre of the random variables measnred. In a constant 
force experiment on an n-domain protein {Di—D 2 —- ■ -Dn), individnal domains nnfold one after 
another bnt any domain can nnfold at any given time. The observed nnfolding times are ordered, 
i.e. they comprise a set of ordered time variates, also known as Order statistics 23|, l2^. Hence, 
what is being measnred are the “time-ordered data” ti.,n,t 2 -n, ■ ■ ■ where ty,n is the r-th 
nnfolding time (r=l, 2,..., n) (in a seqnence of n observations). These are diherent from the 
(hidden) “parent data” ti,t 2 , ■ ■ ■ ,tn with being the nnfolding time of the i-th. domain (Dj, 
i=l, 2,..., n), which contain information abont the individnal protein domains (Di, ^ 2 , • • •, Dn)- 
Hence, the qnestion becomes - can we solve an inverse problem, namely, can we perform the 
inference of the parent distribntions of nnfolding times from the distribntions of (observed) 
ordered nnfolding times? 

The r-th order statistic is characterized by the cnmnlative distribntion fnnctions (cdf’s) 
Pr-n(t), and the probability density fnnctions (pdf’s) Pr-.nif) of the r-th nnfolding time, 
r=l, 2,..., n, in a seqnence of n observations (for n domains). The r-th order statistic cdf Pr-.n{t) 
is the probability that the r-th nnfolding time tr does not exceed t, i.e., Pr-.n{t)=Prob{tr < t), and 
the r-th order statistic pdf is Pr:nif)=dPr-.nit)/dt |23|. In onr example for the dimer WW—WW, 
the hrst nnfolding time (ti: 2 ) and the second nnfolding time (^ 2 : 2 ) both carry information abont 
the nnfolding times for the hrst domain WWi (ti) and for the second domain WW 2 (^ 2 )- fa 
general, becanse the cdf Pr-.nif) and the pdf Pr-.nif) of the r-th order statistic depend on the 
parent cdf Pi{t) and pdf pi{t) (i=l, 2,..., n), it is possible, at least in principle, to resolve Pi{t) 
and pi(t) from the cdf a nd pdf of the order statistics, Pr-.nif) and Pr-.nit). 

26| . we nsed Order statistics to solve the inverse problem for the in- 


In onr recent papers 
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dependent identically distributed {iid) random variables and for the independent non-identically 
distributed {inid) random variables. The formalism presented in these papers can be used to 
describe the physically non-interacting identical protein domains (T*’s) forming a multimeric pro¬ 
tein D—D—.. .—D {iid case), and the non-interacting non-identical domains *=1, 2,..., n) 

forming a multi-domain protein D 1 —D 2 —.. - —Dn {inid case) 25|. We have designed rigorous 
statistical tools for assessing the independence of the parent forced unfolding times (t*) and 
the equality of the parent pdfs of unfolding times {pi{t)) from the observed ordered unfolding 
times {tr:n) [261 . These statistical tests can be utilized to classify the parent unfolding times 
and to detect correlated unfolding transitions in multi-domain proteins using experimental force 
spectroscopy data. We have also extended Order statistics approach to describe the unfold¬ 
ing forces measured in the constant-velocity (force-ramp) experiments j^. Order statistics 
inference methods have been applied successfully to inverse problems involving, e.g., partialy 
observed queuing systems [281-130 1. 

Here we take a step further and present a new theory inspired by Order statistics, to solve 
the inverse problem for the dependent identically distributed {did) random variables and for 
the dependent non-identically distributed {dnid) random variables, using a squared-Gaussian 
parametric model of the distributions of unfolding times. The developed formalism can be 
used to describe the mechanical behavior of interacting (coupled) identical protein domains 
in a multimeric protein D—D—...—D {did case), and interacting non-identical domains in a 
multidomain protein D 1 —D 2 —...—{dnid case). In the next Section, we describe our method. 
We focus on the order statistics pdf’s, since these statistical measures can be easily estimated by 
constructing the histograms of unfolding times. We establish a relashionship between the order 
statistics pdf’s and the parent pdf’s. In Section III, we describe the Self Organized Polymer 
(SOP) model of the all-/3-sheet WW domain and Langevin simulations of the forced unfolding 


of the dimer WIV—WW used as a mode 
single-molecule measurements in vitro 31 


system (Fig. 1). The in silico experiments mimic 


34l |. In section IV, we perform a direct statistical 


analysis of the simulation output for dimer WW—WW and describe the effects of “mechanical 
symmetry breaking” and “topological coupling”. In Section V, we compare several statistical 
measures of the ’’parent data” for each WW domain - the distribution of unfolding times, the 
average unfolding time, the standard deviation, and the skewness of distribution, and Pearson 
correlation coefficient, with the same measures obtained by applying Order statistics inference 
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to the “time-ordered data”. We discuss our results in Section VI. 


II. ORDER STATISTICS INFERENCE 


Consider a vector of order statistics T'=[ti.„, t 2 :n, • • • ,tn:nY, in which the entries obtained in 
a single measurement correspond to the 1-st unfolding time (ti:n), 2-nd unfolding time (t 2 :n), 

..., and n-th unfolding time (tn-.n), the symbol “dagger” (f) represents vector transpose (tr:n 
denotes the r-th unfolding time out of n times for a protein D 1 —D 2 —.. -—Dn of n domains 
Di, D 2 ,..., Dn). This same observation T' can be also obtained by rearranging the components 
of another vector T=[ti,t 2 , ■ ■ ■ ,tn]^ in the order of increasing time variates, which represents 
the unfolding times of the 1-st domain Di (ti), 2-nd domain D 2 (^ 2 ), • • ■, and n-th domain 
Dn (tn) of the same multidomain protein D 1 —D 2 —.. -—Dn- This is because any domain can 
unfold at any given time with a non-zero probability and, hence, there are multiple unfolding 
scenarios. Hence, on the one hand, we observe the “time-ordered data” {tr-.n), and on the other 
hand, we have (hidden) ’’parent data” {ti). We call this transformation map G and write 
T'=G(T) = [fi:„, t 2 :n, • • •, ^n:n]^ wheie tr:n IS the r-th Smallest component of vector T. 

Let vector T' have the joint pdf PT'{ti-.n,t 2 :n, ■ ■ ■, tn-.n) and let vector T have the joint pdf 
PT{ti,t 2 , ■ ■ - tn)- We seek to establish a relationship between the pdf of the ordered data, 
PT'{ti:n,t 2 -.n, ■ ■ ■ ,tn:n), and the pdf of the parent data, Pt(H,^ 2 , ■ • • ^n)- Let us hrst assume, 
that vector T can be written as T=[fi, t 2 , ■ ■ ■, tnY=[x‘l, ..., where X=[a:i, 0 : 2 ,..., is 
an arbitrary random vector sampled from the Gaussian distribution 


Px(a;i,X2, ...,Xn) 


(27r)V2^| 


( 1 ) 


where p=[pi, P 2 , ■ ■ ■, Pn]^ is the vector of the mean, and is the covariance matrix of 

vector X. Here, |S| denotes the determinant of matrix S. We use squared-Gaussian distribution 
since it yields one-sided exponential tail behavior and a non-zero skewness (typical of Gamma- 
process), but unlike Gamma distribution, sqnared-Gaussian is simple to describe correlations of 
the data. 

We next obtain the joint probability density function of the parent data T in terms of the 
random variable X, by using the following general transformation formula for the joint pdf of 
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the multivariate random variable: 


PT{ti,t2, ...,tn)= . . . ,tn))\\j\ti,t2, . . • ,tn)|| 

I 


( 2 ) 


which is valid for a smooth many-to-one mapping from real space to real space with the 
non-zero Jacobian of the inverse mapping. In Eq. ([2]), the sum runs over all inverse branches, 


■ ■ ■ ,tn), with I being the l-th branch, and ||J^|| = 


det 


ad:(t) 

dti 


i,j=^ 


is the absolute 


value of the Jacobian of the l-th inverse branch. The total number of inverse branches for 
the transformation ti=x‘^ (i=l, 2,..., n) is 2"', and they can be written as x\=s\y/U, s^=±l, 
1=1, 2,..., 2"’ (i=l, 2,..., n). The Jacobian matrix for this transformation is given by a diagonal 
matrix, with the i-th diagonal entry given by Ji = 2 ^- determinant of this matrix is 

I Jl=nr=i 2 ^' Substituting the expressions for px and |J^| in Eq. ([2]), we arrive at the joint 
pdf of the parent data expressed in terms of the pdf of vector X; 


Pritl, t2, ■ ■ ■ , tn) 


1 

(27r)-/2v^ 


1 

■t 2 ---tn 


g-i(sVT-/.)tS-l(sVT-/.) 


( 3 ) 


where the sum varies over all vectors of signs s=[si, S2, ■ ■ ■, with the entries Sj=±l, and 
s\/T=[si\/^, 82 ^/%,..., is the vector of inverse transformation written in terms of the 

components (i=l, 2 ,... ,n). 

The hnal step is to obtain the pdf of the time-ordered data, PT'(ti:n,t 2 :n, ■ ■ ■ ,tn:n), in terms 
of the pdf of the parent data using Eq. (|2]). Consider the map G, and use this map in 
Eq. (j2]). Since G orders the components of vector T to produce the order statistics vector 
T'=[fi:n, ^ 2 :n, • • • ,tn:nY, the iuverse branches of G are all possible permutations, k, of the com¬ 
ponents of vector T'. In the case of n domains, there are a total of n\ possible rearrangements 
of n components and, hence, n\ of the inverse branches of G. Since the Jacobian of the per¬ 
mutation transformation is either J-l or —1, the formula connecting the joint pdf of the order 
statistics data and the joint pdf of the parent data reads: 


PT (^1:71) ^2:n) ■ ■ ■ i ^n:n) / ^ Px (^K(l):n) ^K(2):n) • • • ) ^K(n):n) 

K, 

(27r)-/2^^ 2^y/R,n-t2:n---tn-.n^^ 

where the symbolic summation is performed over all possible permutations of components of 
vector T', denoted as k, and S 2 y^b< 2 ^, • • •, is a vector of inverse 

















branches of transformations ti=xf and ordering G. The expression in the second line of Eq. (jl]) 
has 2 n + n{n — l )/2 unknown parameters, including n components of the vector of the mean 
(yU=[/ri, yU. 2 ,..., and {in? + n)/2 entries of the symmetric covariance matrix {?^=[o'ij\'ij=i)■ 

One can use any estimation method to determine these parameters. Given the statistics of Xi 
(/ij and (Tjj), we can obtain the pdf (pi(t)), the mean (/rf), the variance (erf), the skewness of 
distribution ( 7 ^^) for (i=l, 2 ), and estimate pair-wise correlations (p^) using the analysis in 
the Appendix, which treats the case of i, j=l, 2 . 


III. FORCE-CLAMP MEASUREMENTS IN SILICO 


A. Computer model of dimer WW—WW 


We used the Gc-based Self Organized Polymer (SOP) model of the polypeptide chain [35 1 
to describe the dimer WW—WW formed by the all-/9-sheet WW domai ns ( Fig. 1). The WW 


domain has 34 amino acid residues (Protein Data Bank (PDB) entry IPIN 3^). The mechanical 
unraveling of WW is described by the single-step kinetics of unfolding, F ^ U, from the folded 
state F to the unfolded state U. The dimer WW—WW was constructed by connecting the N- 
and C-termini of the adjacent WW domain using flexible linkers of two and four neutral residues 
(Fig. 1). Each residue in WW—WW was represented by its Ga-atom with the Gq—Gq covalent 
bond distance of a=3.8A (peptide bond length). 

The molecular potential energy of a protein conformation, specihed in terms of the residue 
coordinates {rj}, i=l, 2 ,..., M, is given by 


Vmol — Vfene + fwB + 

=-f: log (1 

i=i \ 

M-3 M 




i=l j=i+3 
M-2 


W] -2(^ 








2=1 


(J 


D,i+1 


6 M-3 M \ 6 


i=l j=i+3 


V 


( 5 ) 


where the distance between any two interacting residues i and i+1 is whereas rT+i is its 

value in the native structure. The hrst term in Eq. ([5]) is the FENE potential, which describes the 
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chain connectivity; Ro=2A is the tolerance in the change of a covalent bond {k=1.4:N/m). The 
second term is the Lennard-Jones potential which accounts for the native interactions. 

We assumed that if the non-covalently linked residues i and j (|i — j\>2) are within the cutoff 
distance in the native state rc=8.0A, then Ajj=l and zero otherwise. We used a uniform value 
of 6/1=1.5/cca//mo/, which specihes the strength of the non-bonded interactions. All the non¬ 
native interactions were treated as repulsive (V/vs^)- additional constraint was imposed 
on the bond angle between residues i, i+1, and i+2 by including the repulsive potential with 
parameters ei=lkcal/mol and cr=3.8A, which quantify the strength and the range of repulsion. 
To ensure the self-avoidance of the protein chain, we set (T=3.8A. 


B. Simulations of forced unfolding of dimer WW—WW 


The unfolding dynamics were obtained by integrating the Langevin equations for each particle 
position Tj in the over-damped limit, rjdri/dt=—dV/dri+gi{t) . Here, V=Vmol—'^'^ is the total 
potential energy, in which the hrst term (Vmol) is the molecular contribution (see Eq. ([5])) and 
the second term represents the influence of applied force on the molecular extension Y. Also, 
g(t) is the Gaussian distributed random force and rj is the friction coefficient. In each simulation 
run, the N-terminal Go-atom of the hrst domain {WWi) was constrained and a constant force 
f=/n was applied to the C-terminal Go-atom of the second domain {WW 2 ) in the direction n 
of the end-to-end vector of the dimer Y (see Fig. 1). The Langevin equations were propagated 
with the time step At=0.08r//-=20ps, where TH=C^h‘TL/kBT. Here, TL={ma?/ehY^'^='ips, C=50 
is the dimensionless friction constant for a residue in water {r]=(m/TL), and m^3xl0~‘^^g is 
the residue mass Pulling simulations were carried out at room temperature using the bulk 
water viscosity, which corresponds to the friction coefficient ri=7.0xl0^pNps/nm. We utilized 


the GPL 
data I 38 


based acceleration to generate the statistically representative sets of the unfolding time 


39] . For each value of constant force /=80, 100, 120, 140 and 160pN, we generated 


two sets of trajectories with 1, 000 trajectories in each set: one set for dimer of WW domains 
connected by the linker of two neutral residues, and the other for the four-residue linker. The 
unfolding time for each domain (parent statistics) was dehned as the hrst time at which the end- 
to-end distance of the domain exceeded 90% (~11.25nm) of its contour length L=33a~12.5nm. 

Representative trajectories of the total end-to-end distance for the dimer WW—WW ob- 
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tained at f=100pN and /=160piV are compared in Fig. 2, which also shows graphically the 
dehnition of the hrst unfolding time ti -2 and the second unfolding time ^ 2 : 2 - We observe a typ¬ 
ical “unfolding staircase”, i.e. a series of sudden step-wise increases in the end-to-end distance 
of WW—WW as a function of time. These mark the consecutive unfolding transitions in WW 
domains, which occur at the hrst unfolding time ti :2 ( 1 -st unfolding event) and at the second 
unfolding time ^ 2:2 (2-nd unfolding event). The unfolding transitions are more discrete at a lower 
force /=100pA^ (Fig. 2a) but more continuous at a higher force /=160piV (Fig. 2b). We re¬ 
mind that in experiment (but not in simulations), there is no way of knowing which domain has 
unfolded at any given time, and the time-ordered data {tr-.n) are the only observable quantities. 


IV. TOPOLOGICAL COUPLING AND MECHANICAL SYMMETRY BREAKING 


To provide a basis for the Order statistics inference, we performed a direct statistical analysis 
of the simulaiton output for WW—WW generated at /=80, 100, 120, 140, and lOOpiV (parent 
data). The data sets contain Q=l, 000 unfolding times for each force value and for each linker 
length. First, we constructed the histogram-based estimates of the unfolding times for the 
hrst domain {WWi) and second domain {V/W 2 ), which are compared in Fig. 3 (bin size was 
chosen using the Freedman-Diaconis rule 40|). We calculated the average unfolding times pj 
and P 2 *= 152 ), the standard deviations af and aj {ai={E{tl) — (pf)^)^'^^, 

where Eit"^) is the second moment of f*, i=l, 2 ), and the values of the skewness of distributions 
'jf and 7 J, and the Pearson correlation coefhcient p^=Pi 2 - The skewness of the distributions 
was calculated using the formula 'j'[={E{tf) — 3/if(crf)^ — (/if)^)/(c’‘f)^, where E(tf) is the 
third moment of ti (f=l,2). Pearson correlation coefhcient was calculated as p^={E{tit 2 ) — 
/if/i|’)/(TfcrJ, where E{tit 2 ) is the hrst moment of the product ^ 1 ^ 2 - These statistical measures 
are compared in Table I and II for the case of the linker of two and four residues, respectively. 

The parent distributions of unfolding times are skewed, broadly distributed, and exponential- 
like at low forces {f=80pN) and become more narrowly distributed, and Gaussian-like at high 
forces (/=160piV; see Fig. 3). These changes become manifest when comparing the values 
of /if, af, and yf, i=l,2 (see Tables I and II). Here, /if and /if decrease as / is increased, 
because the applied force destabilizes the native state of WW, and, hence, decreases their 
lifetimes. At low forces, dynamic huctuations are not suppressed and the unfolding transitions 
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are more variable (stochastic), which is reflected in the width of the distributions {crj'). When / 
is increased, fluctuations become less important and the unfolding events become increasingly 
more deterministic (less stochastic), which results in the decrease of aj and aj. 

Interestingly, we found that the unfolding times ti for the first domain iWWi) and ^2 for the 
second domain {WW 2 ) are uncorrelated at low forces, but become correlated (dependent) at a 
high force (/=160piV). This is reflected in the values of the correlation coefficient (Table I 
and II). Hence, our results show that tension couples otherwise non-interacting protein domains 
forming a multi-domain protein. Because this result could have been observed in the case of 
physically interacting protein domains, e.g., through a common binding interface, we termed 
this effect the “topological coupling” to stress the importance of chain connectivity (topology). 
Hence, our results indicate that under large mechanical stress protein domains might become 
topologically coupled even in the absence of domain-domain interactions. 

Another interesting finding is that although the unfolding times for the first domain WWi (ti) 
and for the second domain WW 2 (^ 2 ) are similarly distributed at a low force {f=80pN), these 
become more and more nonidenticaly distributed at higher forces (/=100—160pA^). Indeed, the 
values of pj, af, and yf for i=l, 2 are very similar at f=S0pN, yet, very different at /=160piV 
(Tables I and H). Hence, our results indicate that increased tension breaks the mechanical 
symmetry. Indeed, identical protein domains WWi and WW 2 have very similar mechanical 
properties at a low force, but these become distinctly different at higher forces. 


V. FROM TIME-ORDERED DATA TO PARENT DISTRIBUTIONS 

In force-clamp single-molecule experiments on multidomain proteins, experimentalists have 
no prior knowledge regarding the type of random variables measured. Our results for the dimer 
WW—WW indicate that the mechanical force can topologically couple the non-interacting 
protein domains and can break the similarity in their physical properties even when domains 
are identical. Using our classification of random variables, the unfolding times for identical 
protein domains in a multi-domain protein might form a set of iid random variables at low 
forces, inid random variables at the intermediate force level, and dnid random variables at high 
enough forces. Hence, a unified approach is needed to analyze and model the different types 
of random variables. Here, we describe the results of application of Order statistics inference. 
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developed in Section II, to characterize the forced nnfolding times for the dimer WW—WW 
obtained for different valnes of applied constant force /=80—160piV. The formalism adapted 
for the two-domain protein is presented in the Appendix. 

Because in a single-molecule experiment size of the data sample might be small, we picked 
at random 330 data points (330 pairs (^ 1 ,^ 2 )) out of Q=1,000 observations from each data set. 
Next, we ordered the data for each pair to generate ordered unfolding times as observed in 
experiment. As an example for the dimer WW—WW (linker of two residues), we present the 
histogram-based estimates of the pdf’s of the 1 -st unfolding time (^ 1 : 2 ) and the 2 -nd unfolding 
time (^ 2 : 2)5 obtained at /=80, 120, and 160pN in Fig. 4. The histograms of the ordered un¬ 
folding times are markedly different in terms of their overall shape, width, and position of the 
maximum (most probable unfolding time). That the maximum for ^ 2:2 corresponds to longer 
times compared to the maximum for ti .2 is not unexpected, since, by construction, ti, 2 <t 2 -. 2 - A 
surprising element is that the histograms of ^ 2:2 have longer tais, which means that fluctuations 
play a more important role in the unfolding transitions that occur later in time. According 
to our formalism, the time-ordered data correspond to vectors of order statistics T'=[ti. 2 , ^ 2 : 2 ]^ 
described by the joint pdf Pt'(^ 1 : 2 , ^ 2 : 2 ) (see Eq. flA4j) ). 

The model for two-domain protein WW—WW has hve unknown parameters: the average 
values Pi and p 2 , the standard deviations an and (T 22 , and the covariance cri 2 . These describe 
the statistics of vector X=[xi,X 2 ]h Once determined, they can be used to describe the parent 
statistics of vector T=[ti,f 2 ]^- We employed the Maximum Likelihood Estimation (MLE) to 
obtain the vector of parameters, 6 ={pi, p 2 WiiW 22 Wi 2 )- The likelihood function for the pdf 
given by Eq. flA4j) is but in the model calculations we used the 

log-likelihood function, log[L( 6 ')]=^^f°^ logpT'({ti:n, t 2 :n}j\())■, to obtain the values of pi, p 2 , an. 


a 22 , and ai 2 . These quantities and Eqs. flA5l) . flAbll . flATIl . and flA 8 ll were then used to obtain 
the parent statistics: the average quantities pf and p^, the standard deviations af and aj, the 
skewness coefficients and 7 J, and the correlation coefficient (see Appendix). Finally, the 
closed-form expressions for the parent pdf’s of unfolding times Piif) and P2{t) were obtained by 
integrating out ^2 and ti, respectively, in Eq. (1A3I1 for the joint parent pdf: 

1 1 


Pi{t) = 


aii\p2^ L 


(\/t-Pi)\ , . (\/t + Pi 

exp(- —^exp(- 


2af. 


2ai. 


,* = 1,2 


( 6 ) 


Statistical measures of the parent unfolding times (parent statistics of vector T=[ti,f 2 ]^), pf, 
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/i|’, erf, erf, and obtained by applying Order statistics inference to the order-statistics data, 
are compared with the same quantities, obtained from a direct statistical analysis of the parent 
data, in Tables I and II. The theoretical curves of the parent distributions Pi(t) {i = 1,2), 
generated by substituting the values of pi, o'u and a 22 (statistics of vector X=[a;i, a; 2 ]^) into 
Eq.(j6]), are overlaid in Fig. 3 with the histograms of the parent unfolding times. 

We witness a very good agreement between the histogram-based estimates and theoretical 
curves of the pdf’s of the parent unfolding times for both domains WWi and WW 2 and for all 
force values /=80, 120, and 160pA^ in terms of the overall shape and position of the maximum 
(Fig. 3). In agreement with simulations, our theory predicts that at high forces the second 
domain {WW 2 ), to which the force is applied (Fig. 1), unravels on a faster timescale compared 
to the hrst domain {WWi). Hence, our theory captures the growing inequality of the time 
distributions for WWi and WW 2 at larger forces reflecting the mechanical symmetry breaking. 
Comparing the values of pf, af, ■jf, and p^, obtained using Order statistics inference, with the 
“true” values of these quantities from a direct statistical analysis, we see that the agreement 
is nearly quantitative for pf and af and very good for 'yf at low forces (/=80piV). At larger 
forces (>100pA^), the agreement for pj and aj is very good, yet, the agreement for 'yf is more 
qualitative rather than quantitative. Importantly, Order statistics inference correctly captured 
the mutual independence of unfolding times {ti and ^ 2 ) at low forces, which is reflected in small 
values of Pearson correlation coefficient p^ for /<140pA^. Order statistics inference correctly 
predicts the emergence of topological interactions at higher forces (i.e. for /=160pA^), for which 
the values of p^ are in the 0.1—0.3-range (Tables I and II). 


VI. DISCUSSION 


Single-molecule force spectroscopy has enabled researchers to uncover the mechanism of adap 


tation of protein structures to mechanical loads 


19 


4l| . These experiments are now routinly 


used to study proteins and other biomolecules beyond the ensemble average picture and to map 


the entire distributions of the relevant molecular characteristics 


42 


43|. Yet, existing theo¬ 


retical approaches for analyzing and modeling the experimental results lag behind. This calls 
for the development of next generation theoretical methods, which take into account both the 
complexity of the problem and the nature and statistics of experimental observables. 
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In the previos studies, we have demonstrated that the unfolding times observed in the constant 
force (force-clamp) measurements on a multidomain protein D 1 —D 2 — .. -—Dn comprize a set of 
the r-th order statistics, tr-n (^"=1,2 ,... ,n) [2^, and that the unfolding forces observed in the 
constant velocity (force-ramp) measurements form a set of the hrst order statistic ti-n in a 


sample of decreasing size (n, n — 1,..., 1) (27|. We have also developed rigorous statistical tests 
for classihcation of random variables measured in these experiments [^. Here, we developed an 
Order statistical theory to describe coupled proteins forming a multimeric protein or proteins 
forming the tertiary structure within the same multidomain protein subject to the mechanical 
stress. We focused on interesting new effects of tension-induced mechanical symmetry breaking 
and topological coupling in multi-domain proteins formed by the physically non-interacting 
protein domains, using an example of the two-domain protein WW—WW. However, our theory 
can also be used to characterize the physico-chemical properties of proteins with strong domain- 
domain interactions, such as hbrin hbers, microtubules, and actin hlaments mentioned in the 
introduction. Strong interactions will translate into large values of the off-diagonal elements of 
the covariance matrix (Pearson correlation coefficient). 

Our approach is based on Order statistics inference, which proved to be successful at solving 
the inverse problems. The approach can be used to accurately analyze, interpret, and model 
the results of protein forced unfolding measuremens available from the constant force assays on 
multimeric and multidomain proteins. One of the main results of this paper is that Order statis¬ 
tics inference enables one to analyze and model the forced unfolding times for a multi-domain 
protein formed by noninteracting identical domains {iid case) and non-identical domains [inid 
case), and by interacting identical domains {did case) and non-identical domains {dnid case). 
With little effort, the formalism can be extended to analyze the results of constant velocity 


measurements. We applied our approach to analyze the results o 
silico for the dimer WW—WW of the all-/9-sheet WW domains 44 


protein forced unfolding in 


45[ |. In simulations, one can 


access “the parent data” and “the time-ordered data”. This was used to compare directly the 
various statistical measures - the distributions of unfolding times, the average unfolding times, 
the standard deviations, and the skeweness of the distributions, obtained from the statistical 
analysis of the parent unfolding times for domains WWi and WW 2 , and the same measures, ob- 
taineed by applying Order statistics inference to the time-ordered data. A very good agreement 
obtained between the statistics of unfolding times from pulling simulations and from the theo- 
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retical inference validates our theory. The presense of correlations is reflected in the inequality 
of the joint distribution to the product of the marginals, i.e. ^ 2 )i^'Piif)'P 2 ii) |42|, 1^. In the 


bivariate case for just two domains, dynamic correlations between the parent unfolding times ti 
and ^2 are contained in the off-diagonal matrix element (7^2 of the covariance matrix S for vector 
X=[a;i,a; 2 ]^ Order statistics inference correctly detected the absence of correlations at a small 
force (/=80, 100 and 120piV), and the presence of correlations at a large force (/=160piV). 

We found the mechanical symmetry breaking and topological interactions observed in a multi- 
domain protein subject to tension. The growing asymmetry in the mechanical properties of 
otherwise identical protein domains comprising a multi-domain protein, can be understood by 
considering an example of WW—WW. When the pulling force is applied, it takes time Tf for 
tension to propagate from the tagged residue in domain WIV 2 to the other domain WWi. Hence, 
domain WW 2 is subjected to the mechanical force for a longer time than domain WWi (Fig. 1). 
When force is large enough so that the average unfolding time jUf becomes comparable with t/, 
tension propagation prolongs the lifetime of the more distal domain WWi, i.e. /ii(/)~/i/-|-r/ 
versus fi 2 {f)^f^f- This is reflected in the inequality of the distributions of unfolding times 
for domains WWi and WW 2 (Fig. 3). But this same result would have been observed for 
a different two-domain protein, say D 1 —D 2 , formed by the non-identical domains, e.g., the 
mechanically stronger domain Di and mechanically weaker domain D 2 , characterized by the 
differently distributed unfolding times. 

The topological coupling can be understood using a concept of correlation length. In the 
folded state, overall bending flexibility is determined by the mobility at the domain-domain 
interface. Under low tension, the average inter-domain angle 6 should show large deviations 
from the 180°-angle, A6, and the correlation length Ic is short. Under high tension, A6 decreases 
and Ic increases. This same result would have been observed for the domains that interact, e.g., 
through their inter-domain interface. Domain interactions would have decreased the mobility 
at the domain-domain interface, which, in turn, would have resulted in smaller A6 and longer 
If.. The averaged squared deviations of the inter-domain angle can be linked to the correlation 
length as {A9‘^)=2d/lc, where d is the inter-domain distance. We estimated Ic using the results 
of pulling simulations for WW—WW (linker of two residues). For /=100piV, d^l.2nm and 
|A6'|f«30°, and /c~1.8nm is shorter than the average size of the WW domain at equilibrium 
(Ri3—4nm). For /=160piV, d^2.8nm and |A6*|!^15°. This results in the three-fold increase in 
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lc^5.2nm, which now exceeds the dimension of WW domain. Hence, nnder high tension WW 
domains nnravel in a concerted fasion, which is reflected in the dependence of the unfolding 
times ti and t 2 at large forces (Tables I and II). 

To provide the reader with yet another glimpse of the hidden complexity underlying a seem¬ 
ingly trivial problem of unfolding of a multi-domain protein, we performed pulling simulations 
for the trimer WW—WW—WW , in which the force was applied to the third domain {WW^). 
The histograms of unfolding times for the first domain WWi (ti), second domain WW 2 (^ 2 ), and 
third domain WW^ (t^) are compared in Fig. 5. Numerical values of the correlation coefficients 
for pair-wise correlations pjj, Z 7 ^j=l, 2 , 3 , of the unfolding times are summarized in Table III. 
We see that the distributions of unfolding times are nearly identical and unimodal, reflecting 
the mechanical symmetry, at a low lOOpWforce; yet, the time distributions become increasingly 
more different with the increasing tension. Indeed, the distributions of unfolding times for WWi 
and WW 2 , but not for WW 2 , develop the second mode at /=140piV. The parent unfolding times 
for WWi (ti) and WW 2 (^ 2)5 but not for WWi and WW 3 and not for WW 2 and WW^, develop 
correlations, which tend to grow with force (Table III). 


VII. CONCLUSION 


Intramolecular and intermolecular interactions involving proteins are at the core of virtually 
every biological process. Here, we developed and tested a new Order statistics approach for 
detecting and describing biomolecular interactions. Order statistics offers a new tool kit for 
accurate interpretation and modeling of experimental data on multidomain proteins, multimeric 
proteins, and engineered polyproteins available from single-molecule force spectroscopy. Looking 
into the future, we anticipate that Order statistics based calculus of probability will play an 
important role when single-molecule techniques will expand into new avenues of research such as 


folding of multi-domain proteins protein folding in cellular environment, and nanomechanics 
of protein assemblies (protein fibers, microtubules, actin filaments, viruses, etc.) Also, 

life processes in living cells are coordinated both spatially and temporally. In this respect. Order 
statistics builds in causality into a theoretical description in a natural way. 

Mechanical symmetry breaking and topological coupling might be related to the biological 
utility of proteins. The structural design of multidomain proteins - number of domains, linker 
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length, etc., carves out a larger “parameter space” for the physical characteristics such as me¬ 
chanical strength, tolerance to fluctuations, correlation length, all of which are force-dependent. 
As we showed in the paper, when tension is high enough, this design permits the “mechanical 
differentiation” of protein domains, which become, in some sense, “functional isoforms” of the 
same structural unit. Protein systems may have evolved to select certain modular architec¬ 
tures with the optimal physico-chemical properties for efficient mechanical integration of forces. 
Also, the mechanical stress promotes coupling between structural elements, which transforms 
a mechanically non-cooperative system into a highly cooperative one. This might be a mech¬ 
anism of information transfer over long distances to coordinate cell processes over the relevant 
lengthscales and timescales |l0 |. 
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APPENDIX: ORDER STATISTICS INFERENCE FOR TWO-DOMAIN CASE 


In the case of a two-domain protein D 1 —D 2 , the unfolding time data can be represented by 

vectors T'=[ti: 2 , ^ 2 : 2 ]^ containing the hrst and second order statistic. These data can be used to 

gather information about parent vectors T=[ti, ^ 2 ]^ for the hrst domain {Di) and second domain 

{D 2 ). For each vector T we dehne vector X^=[a;^, in which random variables xi and X 2 are 

sampled from the Gaussian distribution (see Eq. ([T])). Statistics of Xi and X 2 are described by 

r 2 "1 

the vector of the average /i=[/ri,/i 2 ]^ and the covariance matrix S= ^ 2 ^ , where and cr |2 
are the variances of xi and X 2 , respectively, and cri 2 =cr 2 i is the covariance. Eq. (jS]) for the joint 
pdf of the parent data T=[ti,t 2 ]^ becomes 

1 1 


Pt(HG2) — 


27rVls| 22^/HTt: 


■E 


, - i (s^/T-At) t S-1 (sv/T-m) 


(Al) 


where |S|=cr^^cr 22 —o‘i 2 is the determinant and 
ance matrix S. In Eq. flAip . s = [si, ^ 2 ]^ = {[1, l]i, [1, —1]^, [—1,1]^, [—1, —1]^} are vectors of 
all possible signs of y/ti and ^/t 2 and s\/T=[si\/7i, S 2 ^/h\^ is the vector of inverse values, which 
correspond to all possible branches of the inverse transformation \ti^t 2 \ = [x\,x‘^. Performing 
vector multiplication in the exponent of the exponential function forming the sum in Eq. flAip . 


^22 


— (712 
^11 


is the inverse of the covari- 
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we obtain: 


(s^/T-/i)t S-i (s\/T-/i) 


(si\/^ — fiiYa22 + (52\/^ — A''2)^0'ii ~ 2(si-y/t7 — /il)(s2-\A2 ~ /^2)o'l2, 


2 2 2 
<^11 <^22 ~ *^12 


^2) 


which, depending on the choice of s=[si,S 2 ]^ takes the following form: 

, ^ \ (\/^-/ii)V|2 + (v^-/i2)Vfi-2(\/^-/ii)(V^-/i2)cri2 ^ .p 

Kl[ti,t2) - --T-^-^2 ^2 N-’ ® - I-*-’ ■‘■J 

^1*^12 '^11*^221 


k2{tl,t2) = 


(\/tl “ Ail )^<^22 + (\/^ + h' 2 )^Crii + 2 (v^ — /il)(-y/i 2 +/i 2 )cri 2 -.i 

2 (af 2 - crf^ai^) ’ ' “ ^ ^ 


1 (+ + \ (\/^ + /^l )^<^22 + (\/^ “ + 2 (v^ +/il)(\/^ — P- 2 )a‘l 2 r 1 lit 

t 2 ) = - 7 ^ - 2 “^-T - , S = [-1, IJT 


hihyh) = 


2{af2 - aW22) 

{y/tl + fii)‘^a22 + (\/t2 + h'2)^cril ~ 2(y^ + /il)(\/f2 + /^2)c’‘l2 


> s = [-1,-1] 


t 


2(a?2 - aW22) 

Snbstitnting these expressions for kj=kj(ti,t 2 ) (j=l,2, 3,4) and the expression for |S| in Eq. 
lED, we obtain the closed form expression for the joint pdf of the parent data: 

4 


PT(ti, t2) - -—===== • 77== ■ ' 

27rVcr7(j4 - 0-f2 4Vti • 7 




(A3) 


in which the snmmation over s is replaced with the summation over j. 

Next, to obtain the closed form expression for the joint pdf of the order statistics, we use 
Eq. (jl]) and take into account possible permutations of the vector components of order statistics 
[^k(1):2, ^k( 2):2]^- In the 2D-case, there are only two options, namely vector [^ 1 : 2 ,^ 212 ]^ and vector 
[t 2 : 2 ,ti: 2 ]^ whlch correspoud to permutations (^ 1 ,^ 2 ) and (^ 2 ,^ 1 ) of the parent data. Then, the 
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joint pdf of the order statistics is given by 


Pt'(^1;25 ^2:2) — Pt(^1 5 ^ 2 ) + Pt(^ 2, ^l) 
1 1 


4\/^i • t2 27r^/i 


afiCT|2 - crf2 


X 


exp( 

exp( 


iVh — hl)^cr22 + iVh — ^ 2 ) 


2^2 


‘a 


11 


2(\/^i — ~ h2)o■l2^ 


2{al,-alal,) 

iVti — ^J-iYo'22 + {y/t2 + p.2)^0'll + 2,{y/ti — /il)(\/f2 + A'-2 )o'12 ^ 


2 (a 


12 '^11*^22 


exp( 

exp( 

exp( 

exp( 

exp( 

exp( 


(\/tl + /^l)^C’‘22 + (\/^ ~ h2)^0'll + 2(\/ti' + /il)(\/t2 ~ h2)c’‘l2 > 

2(af2 - ' 

(\/^ + hl)^'^22 + iVh + A''2)^Crii 


2(\/^ + /^l)(\/^2 + /^2)0‘12< 


2(a?2 - a?,ai2) 

/Xl)V22 + (\/il - /^2)^cr^l - 2(v^- Pi)(v^- /X2)cri2, 

2(a?2 - crf,ai2) 

iV^ ~ hl)^'^22 + (\/^ + h2)^<^ll + 2(\/f2 ~ hl)(\/^ + h2)<^12 ^ 


2 (a 


12 *^11 <^22 


(\/^ + hl)^'^22 + (V^ ~ h2)^0'll + 2(\/t2 + hl)(\/^ ~ h2)c’'l2 > 

2(af2 - crfiai2) ' 

iVh + hi)^cr22 + (\/^ + h2)^crii 


2(\/^ + hl)(\/^ + h2)c’‘l2' 


(A4) 


2(a?2 - alai2) 

To obtain the average values pi and ii 2 , the standard deviations cth and cr 22 , and the covariance 
(Ti 2 , we used the Maximum Likelihood Estimation method (Section V) and the joint pdf of the 
order statistics (see Eq. flA4p 1. These parameters correspond to the vector X=[a;i,X 2 ]^ which 
allows us to estimate the average value (/if), the standard deviation (af), the skewness of the 
distribution (yf), and the correlation coefficient (p^) for the parent data T=[/i,/ 2 ]^- 

The theoretically estimated average unfolding times for domains Di and D 2 are given by 

pf = EiU) = Eixj) 

= p? + af, z = l,2 (A5) 

where E(ti) denotes the expected value of ti and E{xf) is the second moment of Xj. The standard 
deviations are given by 

= \/Var{ti) = ^Jvar{x‘f) 

= ^E{xt) - (E(x?))2 

(A6) 


Cl, 


= \/44/^? + 2i^f, i = l,2 
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where Var{ti)=Var{xj) is the variance of the parent unfolding times ti=xf, and Ei^xf) and 
E{xf) are the second and fourth moments of Xi, respectively {i=l,2). The skewness of the 
distributions can be estimated as 


T ^ Ejti) - 
^ E{xl) - 3/if 

^ 24/x^af^ + Safi ■ ^ ^ 


(A7) 


The correlation coefficient of unfolding times ti and t 2 , p^=pij (*7^j=l,2) is dehned as 

T _ E(tit2) — E(ti)E(t2) 

^ ^JVar{ti)^JVar{t2) 

^ E{x\xl) - E{xi)E{xl) 

\JVar{x\)yJVar{x2) 

In order to calculate E{x1xl) - the expected value of x1 and x^, we rewrite X 2 in the form 
X 2 = P 2 + P^{xi — Pi) + <^22^1 — P^R, where i? is a normal random variable, independent of 
Xi, with zero mean and unit variance, and p is the correlation coefficient for xi and X 2 , given by 
P = 0 - 12 /(o'! 1 (^ 22 )- Then, E{x\xl) becomes E{x\xl) = E{x\{p 2 +p^{xi-pi) + a 22 \/^ - P^R)"^)- 
Simplifying the expression for E{x\x^ we obtain: 



where the coefficient A, the third moment E{x'^ of Xi, and the fourth moment E(xf) of Xi are 
given, respectively, by 


A 


E(xf) 


<^12 

P2 - 


p\ + 3/iiCr^^, 


E{x\) = p\ + ^p\(yli + 3crfi 


(AlO) 


The correlation coefficient for the parent data can be calculated theoretically by substituting 
Eqs. (lAlOl) into Eq. (lAQIi . then substituting Eq. (lA9p into the second line in Eq. (lASli . and 
hnally using Eqs. flA5l) and flA6l) for the average unfolding time pf=E{x‘f) and the variance 
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aJ=^Var[xX) (i=l,2). 
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FIGURE CAPTIONS 

Figure 1: Schematic representation of the native strnctnre of dimer WW—WW, formed by 
the C-terminal to N-terminal connected all-/3-sheet WW domains (PDB code: IPIN). The hrst 
WIV domain, denoted as WWi, is shown in bine, the second WW domain, denoted as WW 2 , 
is shown in red, and the two-residne linker is shown in yellow color. In pnlling simnlations, the 
constant force / is applied to the C-terminns of domain WW 2 in the direction coinsiding with 
the end-to-end vector of dimer WW—WW; the N-terminns of domain WWi is constrained. 
Strnctnral analysis of nnfolding trajectories revealed that the force nnfolding transitions from 
the native folded state (F) to the nnfolded state (U) in each WW domain occnr in a single step, 
F^U. Shown are the two possible scenarios. In the hrst pathway (left), WWi nnfolds hrst (1-st 
nnfolding time 11 - 2 ) and WW 2 nnfolds second (2-nd nnfolding time ^ 2 : 2 )- In the second pathway 
(right), WWi nnfolds second (^ 2 : 2 ) and WW 2 nnfolds hrst (^ 1 : 2 )- 

Figure 2: The time-evolution of the end-to-end distance of the dimer WW—WW (see Fig. 1), 
Y, under the inhuence of constant pulling force of /=100piV (panel a) and /=160piV (panel 
b). Shown in diherent color are a few representative trajectories. The unfolding transitions in 
WW—WW are rehected in the stepwise increases in Y, which occur at the 1-st unfolding time 
ti :2 (unfolding of WWi or WW 2 ), and 2-nd unfolding time ^ 2:2 (unfolding of WWi or WW 2 ). 
These transitions are magnihed in the insets for each force value for just one simulation run. 

Figure 3: The “parent unfolding times” ti, i=l, 2, for the dimer WW—WW of domains WWi 
and WW 2 connected by the two-residue linker (panels a—f) and four-residue linker (panels g—l). 
Shown are the histogram-based estimates of the pdf’s of unfolding times for the hrst domain 
WWi (ti, blue bars) and for the second domain WW 2 (^ 2 , red bars), obtained directly from 
the simulation output. These are compared with the theoretical curves of the same quantities 
obtained by applying Order statistics inference (black curves) for f=80pN (panels a, b and g, 
h), /=120piV (panels c, d and i, j), and /=160piV (panels e, / and k, 1). 

Figure 4: The “time-ordered unfolding times” tr: 2 , r=l, 2 for the dimer WW—WW (two- 
residue linker). The data are represented by the histogram-based estimates of the pdf’s of the 
1-st unfolding time (ti: 2 ) and 2-nd unfolding time (^ 2 : 2 ) compared for f=80pN (panels a and b), 
f=120pN (panels c and d), and /=160piV (panels e and /). 
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Figure 5: The “parent unfolding times” ti, i=l,2 and 3, for the trimer WW—WW—WW 
of WW domains connected by linkers of two residues. Compared are the histogram-based 
estimates of the parent pdf’s of unfolding times for the hrst domain WWi (ti, blue bars), 
second domain WW 2 (^ 2 , red bars), and third domain WW 3 (t^, green bars), obtained directly 
from the simulation output generated for /=100piV (panels a, b, and c), /=140piV (panels d, e, 
and /), and /=160piV (panels g, h, and i). 

Table I: Statistical measures of the parent unfolding times for the hrst domain WWi {ti), and 
second domain WW 2 (^ 2 ), connected in the dimer WW—WW by the two-residue linker: the 
average unfolding times and the standard deviations aj and aj, the skewness of the 
distributions 7 ^ and 'yj, the Pearson correlation coefficient ^rid Spearman rank correla¬ 

tion coefficient s^=s^ 2 - The estimates of these measures (except for s^), obtained directly from 
the simulation output, are compared with the estimates obtained by applying Order statistics 
inference (shown in parentheses). 


Force, 

T 

hi, 

T 

h 2 , 

T 

, 

T 

T 

Ti 

T 

72 

T 

p 

T 

S 

pN 

ms 

ms 

ms 

ms 






174.2 

106.1 

134.4 

102.3 

1.69 

1.51 

- 0.012 

0.006 

80 

(189.8) 

(93.5) 

(143.7) 

(88.5) 

( 1 . 2 ) 

(1.55) 

(-0.097) 



4.13 

2.25 

3.60 

2.25 

1.41 

1.67 

-5-10-® 

0.025 

100 

(4.19) 

(2.09) 

(3.77) 

(1.87) 

(1.46) 

(1.67) 

(-0.067) 



0.32 

0.13 

0.23 

0.12 

1.71 

1.98 

-0.064 

-0.049 

120 

(0.31) 

(0.13) 

(0.23) 

(0.08) 

( 1 . 20 ) 

(0.98) 

(- 0 . 111 ) 



0.093 

0.034 

0.031 

0.019 

1.21 

2.01 

0.014 

0.132 

140 

(0.095) 

(0.032) 

(0.024) 

(0.016) 

(0.39) 

(0.76) 

(0.174) 



0.058 

0.016 

0.010 

0.006 

1.18 

1.86 

0.131 

0.123 

160 

(0.057) 

(0.016) 

(0.009) 

(0.005) 

(0.24) 

(0.51) 

(0.151) 
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Table II: Same quantities as in Table I but for the dimer WW—WW of WW domains connected 
by the linker of four residues. 


Force, 

T 

hi, 

T 

T 

T 

T 

7i 

T 

72 

T 

p 

T 

S 

pN 

ms 

ms 

ms 

ms 






166.0 

112.9 

133.5 

99.6 

1.74 

1.35 

0.028 

0.037 

80 

(150.0) 

(103.8) 

(125.4) 

(94.3) 

(1.34) 

(1.47) 

(0.077) 



3.80 

2.46 

3.30 

2.30 

1.42 

1.62 

0.030 

0.052 

100 

(4.04) 

(2.52) 

(3.59) 

(1.97) 

(1.44) 

(1.24) 

(0.048) 



0.32 

0.15 

0.23 

0.12 

1.68 

1.73 

-0.026 

-0.033 

120 

(0.30) 

(0.16) 

(0.23) 

(0.10) 

(1.20) 

(0.96) 

(-0.004) 



0.10 

0.040 

0.036 

0.023 

1.42 

1.55 

-0.017 

0.069 

140 

(0.10) 

(0.039) 

(0.032) 

(0.021) 

(0.46) 

(0.83) 

(-0.266) 



0.065 

0.021 

0.011 

0.008 

1.19 

1.00 

0.259 

0.255 

160 

(0.064) 

(0.021) 

(0.011) 

(0.008) 

(0.25) 

(0.55) 

(0.285) 



Table III: The Pearson correlation coefficients, pfj = pf 2 , pfs, and p '^21 ^^e Spearman rank 

correlation coehcients, sf^ = 5 ^ 3 , sfg, and S 22 , quantifying the degree of pairwise correlations 
(dependence) of the parent unfolding times U (i=l,2,3) for the hrst domain WWi (fi), second 
domain WW 2 (^ 2 ), and third domain WW 3 (fs). These measures are obtained by performing a 
statistical analysis of the simulation output for the trimer WW—WW—WW of WW domains 
connected by the linkers of two residues. 


Force, pN 

pU 

4) 

Pr3(4) 

pUsI) 

100 

-0.023 ( 

;0.032) 

0.053 

(0.050) 

0.038 (0.035) 

120 

-0.073 ( 

-0.130) 

-0.035 

(-0.039) 

0.054 (-0.032) 

140 

-0.341 ( 

-0.295) 

0.042 

(0.057) 

0.043 (0.092) 

160 

-0.537 ( 

-0.139) 

0.038 

(0.021) 

-0.016 (-0.032) 
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FIG. 1: (Olga Kononova, Lee Jones, Valeri Barsegov) 














Time, ms 


Time, ms 


FIG. 2: (Olga Kononova, Lee Jones, Valeri Barsegov) 

































FIG. 3: (Olga Kononova, Lee Jones, Valeri Barsegov) 






































































FIG. 4; (Olga Kononova, Lee Jones, Valeri Barsegov) 
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FIG. 5: (Olga Kononova, Lee Jones, Valeri Barsegov) 































































